Assessing Text Readability Using Hierarchical Lexical Relations Retrieved from WordNet
Authors
Shu-Yen Lin et al.
Department of English, National Taiwan Normal University
E-mail: {yenyenet, vennysu, yudalai, lchyang1112, shukai}@gmail.com
Abstract
Although some traditional readability formulas have shown high predictive validity (r = 0.8 and above; Chall & Dale, 1995), they are generally based on statistical correlations rather than on genuine linguistic processing factors (Crossley et al., 2008). Improving readability assessment therefore means finding variables that truly represent the comprehensibility of a text, as well as indices that measure those correlations accurately. In this study, we explore the hierarchical relations between lexical items based on the conceptual categories proposed in Prototype Theory (Rosch et al., 1976). According to this theory and its later development, basic level words like guitar represent the objects humans interact with most readily. Children acquire them earlier than their superordinate words, like stringed instrument, and their subordinate words, like acoustic guitar. Accordingly, the readability of a text is presumably associated with the ratio of basic level words it contains. WordNet (Fellbaum, 1998), a network of meaningfully related words, provides the best online open source database for studying such lexical relations. Our study shows that a basic level noun can be identified by the ratio at which it forms compounds (e.g. chair → armchair) and by its length difference relative to its hyponyms. We compared graded readings for American children and high school English readings for Taiwanese students using several readability formulas and basic level noun ratios (i.e. the number of basic level noun types divided by the number of noun types in a text). The results suggest that basic level noun ratios provide a robust and meaningful index of lexical complexity, which is directly associated with text readability.
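The two cues named in the abstract (compound formation and length difference relative to hyponyms) and the basic level noun ratio can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy hyponym table stands in for a WordNet lookup, compound formation is approximated by a crude substring test, and the two thresholds are arbitrary placeholders.

```python
from typing import Dict, Iterable, List

# Hypothetical toy data standing in for hyponyms retrieved from WordNet.
TOY_HYPONYMS: Dict[str, List[str]] = {
    "chair": ["armchair", "rocking chair", "wheelchair", "highchair"],
    "guitar": ["acoustic guitar", "electric guitar", "bass guitar"],
    "furniture": ["chair", "table", "bed", "sofa"],
    "armchair": [],
}


def compound_ratio(word: str, hyponyms: List[str]) -> float:
    """Share of hyponyms that contain the word (crude compound test)."""
    if not hyponyms:
        return 0.0
    compounds = [h for h in hyponyms if h != word and word in h]
    return len(compounds) / len(hyponyms)


def mean_length_difference(word: str, hyponyms: List[str]) -> float:
    """Average number of characters by which hyponyms exceed the word."""
    if not hyponyms:
        return 0.0
    return sum(len(h) - len(word) for h in hyponyms) / len(hyponyms)


def is_basic_level(
    word: str,
    hyponyms: List[str],
    min_compound_ratio: float = 0.5,  # assumed threshold, not from the paper
    min_length_diff: float = 1.0,     # assumed threshold, not from the paper
) -> bool:
    """Flag a noun as basic level when both cues pass their thresholds."""
    return (
        compound_ratio(word, hyponyms) >= min_compound_ratio
        and mean_length_difference(word, hyponyms) >= min_length_diff
    )


def basic_level_noun_ratio(
    nouns: Iterable[str], hyponym_table: Dict[str, List[str]]
) -> float:
    """Basic level noun types divided by all noun types in a text."""
    noun_types = {n.lower() for n in nouns}
    if not noun_types:
        return 0.0
    basic = {n for n in noun_types if is_basic_level(n, hyponym_table.get(n, []))}
    return len(basic) / len(noun_types)


if __name__ == "__main__":
    text_nouns = ["chair", "guitar", "furniture", "armchair", "chair"]
    print(basic_level_noun_ratio(text_nouns, TOY_HYPONYMS))
```

The ratio is computed over noun types rather than tokens, matching the definition in the abstract, so the repeated "chair" counts once. A real pipeline would replace the toy table with hyponyms drawn from WordNet and would need a more careful compound test than substring matching.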
Similar resources
Measuring Text Readability by Lexical Relations Retrieved from Wordnet
Current readability formulae have often been criticized for being unstable or not valid. They are mostly computed in regression analysis based on intuitively-chosen variables and graded readings. This study explores the relation between text readability and the conceptual categories proposed in Prototype Theory. These categories form a hierarchy: Basic level words like guitar represent the obje...
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...
One Lexicon, Two Structures: So What Gives?
We present a reinterpretation of lexical information embedded in the English WordNet in an alternate type of structure called lexical system. First, we characterize lexical systems as graphs of lexical units (word senses) connected mainly by Meaning-Text lexical function relations, then introduce a hand-built lexical system: the French Lexical Network or frLN, a lexical resource that implements...
Creation of Lexical Relations for IndoWordNet
WordNet is an electronic lexical database available online as a powerful resource for researchers in computational linguistics, text processing and other related areas. A WordNet for the Hindi language has already been developed by IIT, Bombay. WordNets for the other Indian languages are being created using the expansion approach from the Hindi WordNet under the IndoWordNet project. In expansion approach, ...
Assessing Chinese Readability using Term Frequency and Lexical Chain
This paper investigates the appropriateness of using lexical cohesion analysis to assess Chinese readability. In addition to term frequency features, we derive features from the result of lexical chaining to capture lexical cohesion, where the E-HowNet lexical database is used to compute semantic similarity between nouns with high word frequency. Classification models for assessing ...
Journal title:
- IJCLCLP
Volume 14
Publication date: 2009